Speech recognition for huge vocabularies by using optimized sub-word units
نویسندگان
چکیده
This paper describes approaches for decomposing words of huge vocabularies (up to 2 million) into smaller particles that are suitable for a recognition lexicon. Results on a Finnish dictation task and a flat list of German street names are given.
منابع مشابه
Speech recognition using sub-word units dependent on phonetic contexts of both training and recognition vocabularies
This paper proposes a new speech recognition algorithm using a new context-dependent recognition unit design method for e cient and precise acoustic modeling. This algorithm uses both training and recognition vocabularies to select context-dependent units which precisely represent acoustic variations due to phonetic contexts in a recognition vocabulary. An e cient training algorithm for selecte...
متن کاملSub-word modeling for automatic speech recognition
Modern automatic speech recognition systems handle large vocabularies of words, making it infeasible to collect enough repetitions of each word to train individual word models. Instead, large-vocabulary recognizers represent each word in terms of sub-word units. Typically the sub-word unit is the phone, a basic speech sound such as a single consonant or vowel. Each word is then represented as a...
متن کاملWhich units for acoustic and language modeling for Khmer automatic speech recognition?
In this paper we present an overview on the development of a large vocabulary continuous speech recognition system for Khmer language. Methods and tools used for quick language resources collection for the development of an ASR system for a new under-resourced language are presented. Face with the problem of lack of text data and the word error segmentation in language modeling, we investigate ...
متن کاملTHE JOHNS HOPKINS UNIVERSITY Sub-Lexical and Contextual Modeling of Out-of-Vocabulary Words in Speech Recognition
Large vocabulary speech recognition systems fail to recognize words beyond their vocabulary, many of which are information rich terms, like named entities or foreign words. Hybrid word/sub-word systems solve this problem by adding sub-word units to large vocabulary word based systems; new words can then be represented by combinations of subword units. We present a novel probabilistic model to l...
متن کاملتشخیص دستنوشتۀ برخط فارسی با استفاده از مدل زبانی و کاهش قوانین نگارش کاربر
The Joint-up, cursive form of Persian words and immense variety of its scripts, also different figures of Persian letters depending on their sitting positions in the words, have turned the Persian handwritings recognition to an intense challenge. The major obstacle of the most often recognition ways, is their inattention to sentence contexture which causes utilizing of a word with correct appea...
متن کاملذخیره در منابع من
با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید
عنوان ژورنال:
دوره شماره
صفحات -
تاریخ انتشار 2001